{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CUDA C Acceleration \n",
    "Before we begin, let us execute the below cell to display information about the NVIDIA® CUDA® driver and the GPUs running on the server by running the `nvidia-smi` command. To do this, execute the cell block below by clicking on it with your mouse, and pressing Ctrl+Enter, or pressing the play button in the toolbar above. You should see some output returned below the grey cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Copy and Compile the Serial code\n",
    "\n",
    "Before start modifying the serial code, let's make a copy of the serial code and rename it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cp ../source_code/serial/* ../source_code/cuda-c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/cuda-c && make clean && make"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the Serial code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/cuda-c && ./cfd 64 500"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "# Start adding CUDA C constructs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, you can start modifying the C++ code and the `Makefile`:\n",
    "\n",
    "[cfd code](../source_code/cuda-c/cfd.cpp) \n",
    "\n",
    "[Makefile](../source_code/cuda-c/Makefile)\n",
    "\n",
    "Remember to **SAVE** your code after changes, before running below cells.\n",
    "\n",
    "#### Some Hints\n",
    "Check if there is any data race in your code.( More details on data race is present in the Links and resources section below)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compile and run CUDA C enabled code\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#compile for Tesla GPU (C/C++)\n",
    "!cd ../source_code/cuda-c && make clean && make"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Profile the CUDA C Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/cuda-c && nsys profile -t nvtx,openacc,cuda --stats=true --force-overwrite true -o minicfdcudac_profile ./cfd 64 500"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can examine the output on the terminal or you can download the file and view the timeline by opening the output with the NVIDIA Nsight Systems."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download and save the report file by holding down <mark>Shift</mark> and <mark>right-clicking</mark> [here](../source_code/cuda-c/minicfdcudac_profile.nsys-rep) then choosing <mark>save Link As</mark>. Once done, open it via the GUI."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validating the Output\n",
    "\n",
    "Make sure the error value printed as output matches that of the serial code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recommendations for adding CUDA C\n",
    "\n",
    "After finding the hotspot function take an incremental approach to add pargmas. \n",
    "\n",
    "1) Convert files using CUDA kernels to .cu \n",
    "\n",
    "2) Ignore the initialization, finalization and I/O functions\n",
    "\n",
    "3) Cross check the output after incremental changes to check algorithmic scalability\n",
    "\n",
    "4) Start with a small problem size that reduces the execution time. \n",
    "\n",
    "\n",
    "**General tip:** Be aware of *Data Race* situation in which at least two threads access a shared variable at the same time. At least on thread tries to modify the variable. If data race happened, an incorrect result will be returned. So, make sure to validate your output against the serial version."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Links and Resources\n",
    "\n",
    "[CUDA Introduction ](https://developer.nvidia.com/blog/even-easier-introduction-cuda/)\n",
    "\n",
    "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
    "\n",
    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
    "\n",
    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
    "\n",
    "Don't forget to check out additional [Open Hackathons Resources](https://www.openhackathons.org/s/technical-resources) and join our [OpenACC and Hackathons Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
    "\n",
    "--- \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Licensing \n",
    "\n",
    "Copyright © 2022 OpenACC-Standard.org.  This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
